Tracking multimodal interaction with new media

Author

  • Jana Holsanova
Abstract

From a cognitive perspective, this paper summarises a number of theoretical and applied studies conducted by my colleagues and myself on the topic 'interaction with new media'. The focus lies on the users' behaviour: visual information gathering, interaction with the multimodal interface, browsing strategies, and attentional processes during hypertext navigation. In addition, we also look at users' expectations and attitudes towards the medium. There are several methods that can be used to describe user behaviour and postulate a number of underlying cognitive mechanisms (Holsanova 2004). In the following, I will show how eye-tracking data supplemented by simultaneous or retrospective verbal protocols, keystroke logging, and interviews can help us investigate users' behaviour, the rationality behind this behaviour, and users' attitudes and expectations.

User behaviour on the Internet

The first study, conducted in 1996 by David de Léon and myself, was a qualitative study of user behaviour on the World Wide Web concerning hypertext navigation and browsing strategies. Eight participants were filmed whilst performing user-defined tasks and then asked to review the video-taped session during prompted recall. This data formed the basis for a series of descriptions of user behaviour and the postulation of a number of underlying cognitive mechanisms. Our results indicate that users lack ready-made search strategies, prefer alternatives that are visible, immediately available and familiar, choose the path of least resistance, exhibit social forms of behaviour, engage in parallel activities, object to misleadingly presented information, have trouble orienting, are late in using appropriate strategies, are sensitive to matters of time, and are emotionally involved in the activity.
Finally, we discuss how these results can contribute to our understanding of hypermedia (for details, see de Léon & Holsanova 1997; this paper, as well as most of the following papers mentioned, can be downloaded in PDF format at http://lucs.lu.se/jana.holsanova).

Eye movements and attention

In the following studies, we wanted to follow visual scanning and attentional processes more exactly and thus started using an eye tracker. The purpose of eye movements is to bring a particular portion of the visual field into high resolution so that we can see it in fine detail. When we focus our concentration and eye movements on a point, we also direct our attention to that point. Data on visual behaviour can thus be used as measures of cognitive processes. Eye tracking allows us to register precisely what is fixated and when. We can follow the path of attention deployed by the observers and gain insights into what the observers found interesting, what drew their attention, how a scene was perceived, and so on.

Picture viewing and picture description

In my dissertation (Holsanova 2001), I investigated visual scanning and simultaneous verbal description of complex scenes. The goal was to compare the contents of the verbal and visual 'attentional spotlight' and to answer the following questions: Can we identify comparable units in visual perception and in language production? Does the order of units in the verbal description reflect the general order in which information was acquired visually? Is the content of the units in picture viewing and picture description similar? In order to synchronise the verbal and the visual streams, I created an analytic format called multimodal time-coded score sheets. With the help of this new analytic format, I could analyse temporal and semantic relations between speech and gaze and extract configurations of verbal and visual clusters from the synchronised data. (Concerning the results of my studies, consult http://lucs.lu.se/jana.holsanova.)
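The core idea of a time-coded score sheet — pairing each speech segment with the gaze fixations it overlaps in time — can be sketched in a few lines. The segment data below (labels, timestamps) is invented for illustration; it is not taken from the studies themselves.

```python
# Hypothetical sketch of multimodal time-coded alignment: each stream is a
# list of (start_ms, end_ms, label) segments, and every speech segment is
# paired with the gaze fixations that temporally overlap it.

def overlaps(a, b):
    """True if two (start, end, label) segments overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

def align_streams(speech, gaze):
    """Map each speech label to the gaze labels fixated while it was spoken."""
    return {s[2]: [g[2] for g in gaze if overlaps(s, g)] for s in speech}

# Invented sample data: a speaker describes a scene while being eye-tracked.
speech = [(0, 900, "there is a house"), (900, 1800, "with a red roof")]
gaze = [(0, 400, "house"), (400, 1100, "roof"), (1100, 1700, "roof")]

print(align_streams(speech, gaze))
# {'there is a house': ['house', 'roof'], 'with a red roof': ['roof', 'roof']}
```

Real score sheets additionally code semantic relations between the paired units; this sketch only captures the temporal synchronisation step.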
The method of combining eye tracking and verbal protocols in order to get an enhanced picture of attentional processes has been further developed in our recent study (see 'The dynamics of perception and production during text-writing').

Newspaper and net paper reading

Eye tracking has also been used in applied studies of readers' interaction with newspaper and net paper layout. For designers, the most interesting issues are entry points (where do the readers start reading?), reading paths (how do the readers navigate through the medium?), reading depth (how carefully do they read the articles?) and local design factors (the effect of colour, pictures, headlines, drop quotes, etc.). Two behaviours can be distinguished: reading behaviour (well-defined movements across the text) and scanning behaviour (large saccades in almost any direction while the reader is evaluating articles: are they worthy of deeper processing?). We conducted several applied studies on newspaper and net paper reading (see references), but apart from the issues mentioned above, we also wanted to understand some of the underlying rationale and motivation of the behaviour. We were interested in readers' reflections, experiences, comments and attitudes towards the new medium. To achieve this aim, we used a combination of three methods: (i) eye tracking, (ii) retrospective verbal protocols supported by the replay of the interaction, and (iii) interview data. Let us review some of the main questions and answers. How do readers interact with net papers compared to newspapers? Does the new medium influence our way of searching for news? How do readers orient and navigate, and which reading paths and entry points do they choose? What are their attitudes towards the medium? Our results show that net paper readers in fact scan more and read less than newspaper readers.
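The reading/scanning distinction could, for instance, be operationalised by thresholding saccade amplitudes between successive fixations. The sketch below is a hypothetical illustration; the 3-degree threshold and the fixation coordinates are invented, not taken from the studies described here.

```python
import math

# Hypothetical operationalisation of the reading/scanning distinction:
# short saccades between successive fixations are counted as reading,
# large ones as scanning. The 3-degree threshold is invented for
# illustration only.

def classify_saccades(fixations, threshold_deg=3.0):
    """fixations: list of (x, y) fixation positions in degrees of visual angle.

    Returns one 'reading'/'scanning' label per saccade between fixations."""
    labels = []
    for (x1, y1), (x2, y2) in zip(fixations, fixations[1:]):
        amplitude = math.hypot(x2 - x1, y2 - y1)
        labels.append("reading" if amplitude <= threshold_deg else "scanning")
    return labels

fixations = [(0, 0), (2, 0), (4, 0), (15, 10), (17, 10)]
print(classify_saccades(fixations))
# ['reading', 'reading', 'scanning', 'reading']
```

A real classifier would also consider saccade direction (reading saccades are mostly rightward in Latin-script text) and fixation durations, but the amplitude threshold already captures the basic contrast.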
We furthermore investigated whether this result can be explained by differences in layout, navigation structure and purpose of reading between the two media. The reading and interaction patterns in the two media differed, and net paper readers seemed to have problems with orientation in the hyper-structure. Below are some of the comments that the participants made on their interaction, which could be used to explain the patterns found on the basis of eye tracking.

Reading situation, purpose of reading: "I just have a quick look at the net paper during a break to see what has happened." "It's a nice relaxation to spend time reading a newspaper over a cup of coffee."

Navigation: "I start from the first page to get an overview, but I have problems getting back."

Entry points: "I think it's difficult to decide whether to enter an article on the basis of the headline only. It easily happens that you misunderstand it." "The content links were messy and it took some time to scan them through in order to find what to read."

Topic spectrum: "I think net papers are good because it is easy to compare different news sources and how they report the same news."

The reading situation and purpose of reading vary a lot between the different media. The comments on navigation confirmed the patterns in reading paths: net paper readers have difficulties orienting and navigating. In addition, switching to another story is very slow in net paper reading compared to newspaper reading. While it takes only 20-50 ms to move the eyes to another article on the newspaper fold, for net paper readers it takes many times longer: they have to determine which article they want to read by reading a content summary (link), click on the link, and wait for the article page to be downloaded. The feedback is very slow. Since switching stories carries such a high cost, readers tend to avoid switching between them.
Once they have decided to read a story, they read it to the end. To sum up, newspaper reading is characterised by a broad topic spectrum and shallow reading. Net paper reading, on the contrary, is characterised by a narrow topic spectrum and deep reading. For more results, consult e.g. Holmqvist et al. (2003) and Holsanova & Holmqvist (2004).

Testing theoretical frameworks

Researchers within the sociosemiotic framework have suggested how newspaper layout could be analysed (Kress & van Leeuwen 1990, 1996, 1998). This analytical framework has also been applied to the Internet (cf. Karlsson & Ledin, 2000). In one of our studies, we decided to test some of Kress & van Leeuwen's hypotheses about the reading of newspaper layout against data from readers' authentic interaction with the newspaper. In particular, we used eye tracking to empirically test hypotheses about entry points and reading paths. First, the newspaper layout was analysed according to the sociosemiotic approach (without any knowledge of the actual reading behaviour). Second, eye movement data on the newspaper fold was analysed in three different ways: (i) the temporal order of the attended areas was calculated in order to determine reading priorities; (ii) the amount of time spent on different areas was calculated in order to determine which areas were read most; (iii) finally, reading depth was calculated in order to determine how carefully those areas were read. The results show that reading behaviour is very dynamic: readers register the units predicted by the sociosemiotic theory, but the reading paths are created in very different ways (cf. Holsanova, Rahm & Holmqvist 2003, accepted).

Dynamics of perception and production in on-line text writing

The process of visual information gathering and on-line text production was studied in a current project in co-operation between researchers from the Cognitive Science Dept. and Linguistics Dept.
at Lund University (project leaders: Sven Strömqvist and Kenneth Holmqvist). Our project team collected data from 96 participants (balanced for age, gender, and dyslexics/controls) and used an extended methodology: production-rate data from keystroke logging (ScriptLog), eye-tracking data (iView), and follow-up debriefing interviews supported by the playback of the interaction. For analysing the interaction between writing and gaze behaviour, we have developed an analytical tool (Andersson et al., accepted), inspired by the multimodal time-coded score sheets developed by Holsanova (2001). The tool helps analyse temporal and semantic synchrony in picture viewing and picture description. This offers an enhanced picture of the attentional processes: which objects or areas were scanned visually, and which objects or areas were described verbally at a certain point in time. When investigating picture descriptions, we can see how the writers' attention is distributed between the stimulus picture, keyboard, computer monitor and elsewhere during writing. In sum, this analytic format gives deeper insights into the dynamics of perception and production during on-line text writing.

Integration and interpretation of text and pictures

Multimodality has been studied within many different disciplines: semiotics, sociosemiotics, text linguistics, interface design, and human-computer interaction (cf. Holsanova 1996; for an overview, cf. Holsanova 1999, 2002). However, very few studies concern the way users perceive the interplay between the modalities: from which source do users acquire information, and how do they integrate it? Some first steps have been taken with the reading of diagrams and within applied studies of advertisements (Rayner et al. 2001). The process of semantic interpretation during picture-elicited writing will also be an important contribution to this topic (Holsanova 2003).
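The kinds of measures discussed above — entry order of attended areas, time spent per area, and switches between text and picture — can all be derived from a single fixation sequence over areas of interest (AOIs). The sketch below uses invented AOI names and durations purely for illustration.

```python
from collections import Counter

# Sketch of three common AOI analyses over an invented fixation sequence.
# Each fixation is (duration_ms, aoi); AOI names are hypothetical.

def aoi_metrics(fixations):
    """Return (entry order of AOIs, dwell time per AOI, AOI-to-AOI transitions)."""
    entry_order, dwell, transitions = [], Counter(), Counter()
    # Count transitions between successive fixations that change AOI,
    # e.g. switches between text and picture.
    for (_, prev), (_, aoi) in zip(fixations, fixations[1:]):
        if prev != aoi:
            transitions[(prev, aoi)] += 1
    # First-visit order and accumulated dwell time per AOI.
    for dur, aoi in fixations:
        if aoi not in entry_order:
            entry_order.append(aoi)
        dwell[aoi] += dur
    return entry_order, dict(dwell), dict(transitions)

fixations = [(250, "headline"), (200, "picture"), (300, "text"),
             (220, "picture"), (400, "text")]
order, dwell, trans = aoi_metrics(fixations)
print(order)   # ['headline', 'picture', 'text']
print(dwell)   # {'headline': 250, 'picture': 420, 'text': 700}
print(trans)   # counts of text<->picture and headline->picture switches
```

Transition counts between text and picture AOIs are one simple proxy for how actively a reader integrates the two modalities; entry order and dwell time correspond to the reading-priority and reading-depth analyses mentioned above.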
Another recent study at the LUCS Eye Tracking Lab is the analysis of reading behaviour concerning infographics. The reading paths between headline, text, and infographics can give us hints about the integration and interpretation processes. To sum up, the perception of multimodality is strongly understudied and deserves to be focused on more extensively in future research.

Summary and conclusions

Gaze behaviour reveals the path through the text, the paths through the picture, and the connections created between picture and text. It can be used to track attentional processes. If eye-tracking data is supplemented by simultaneous or retrospective verbal protocols, the analysis can give a very detailed view of reader interaction with the media. By combining various methods, such as eye tracking, keystroke logging, retrospective verbal protocols supported by the replay of the interaction, and interviews, we obtain multiple 'windows into thought and action'. We can track users' navigation and orientation, detect interaction problems, evaluate interface design, test theoretical models against empirical results, and investigate the perception of multimodality. This allows us to investigate user behaviour, the rationality behind the behaviour, and users' expectations and attitudes.

Visions

One of the visions is to track the natural integration of different modes of communication (speaking, pointing, scanning pictures, writing text, etc.). When we have two or more continuous streams of behaviour, what does the semantic unification process look like? Where and when do we integrate those streams? Are they overlapping or sequential? What are the meaningful units of our interaction? Oviatt (1999) stresses the need for guidance from cognitive science on co-ordinated patterns of human perception and production based on empirical evidence.
The clusters and integration patterns discovered in such empirical studies could be used in developing a new generation of multimodal interactive systems within HCI.
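The question of whether two behavioural streams are overlapping or sequential can be made concrete by measuring their temporal co-occurrence. The sketch below sums the overlap between two interval streams; the stream names and timestamps are invented for illustration.

```python
# Hypothetical measure of co-occurrence between two behavioural streams,
# each given as a list of (start_ms, end_ms) intervals. Zero total overlap
# would indicate strictly sequential behaviour.

def total_overlap(stream_a, stream_b):
    """Sum of pairwise temporal overlap (ms) between two interval streams."""
    return sum(max(0, min(a_end, b_end) - max(a_start, b_start))
               for a_start, a_end in stream_a
               for b_start, b_end in stream_b)

# Invented example: speech and pointing gestures during a description task.
speaking = [(0, 1000), (1500, 2500)]
pointing = [(800, 1700)]
print(total_overlap(speaking, pointing))  # 400 ms of co-occurrence
```

Normalising this sum by the total duration of either stream would give a simple overlap ratio, one possible starting point for the kind of empirical integration patterns envisioned above.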



Publication date: 2004